A Survey on Keyword Diversification Over XML Data

Author

  • Mohammed Koya
Abstract

Keyword queries are sets of terms that users enter to retrieve documents containing all or any of those terms. They are the most familiar and popular way for ordinary users to search data, and because of their simplicity they have become one of the most effective means of information discovery, especially over HTML documents on the World Wide Web. Users need not learn a complex query language or have any prior knowledge of the structure of the underlying data. Keyword queries are, however, highly ambiguous. A single interpretation of a keyword query is often not sufficient to satisfy the user, while returning too many interpretations may yield unnecessary results; either case leads to user dissatisfaction. Diversification aims to minimize this dissatisfaction: the query should return the major possible interpretations of the keywords in the underlying database so that the user can easily select the intended one. In information retrieval (IR), keyword queries return a list of relevant documents that must be analyzed manually one by one, whereas keyword queries over structured data allow a more direct and effective form of diversification. In keyword search over a structured or semi-structured database, if a keyword occurs as the value of more than one attribute, each occurrence can be treated as a different interpretation, and each interpretation yields different results. Keyword query diversification can be defined as follows: for a given keyword query over an XML dataset, the user should receive a set of top-k results in which each result is relevant to the query and the results are maximally different from one another [1]. Diversification is based on the underlying data being searched.
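The top-k definition above can be made concrete with a small sketch. It is not the algorithm of any surveyed approach: it assumes a simple greedy heuristic in the style of maximal marginal relevance, and the names `diversify`, `jaccard`, and `trade_off`, as well as the toy relevance scores, are hypothetical and chosen only for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diversify(results, k, trade_off=0.5):
    """Greedily pick k results that are relevant and mutually different.

    results: list of (result_id, relevance, terms) tuples (assumed format).
    trade_off: weight on relevance versus dissimilarity (assumed parameter).
    """
    selected = []
    remaining = list(results)
    while remaining and len(selected) < k:
        def score(item):
            _, relevance, terms = item
            # Penalize similarity to the closest already-selected result.
            max_sim = max((jaccard(terms, s[2]) for s in selected), default=0.0)
            return trade_off * relevance - (1 - trade_off) * max_sim
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [result_id for result_id, _, _ in selected]


# Toy data: "a" and "b" share one interpretation, "c" reflects another.
RESULTS = [
    ("a", 0.9, ["apple", "fruit"]),
    ("b", 0.85, ["apple", "fruit"]),
    ("c", 0.6, ["apple", "company"]),
]
print(diversify(RESULTS, k=2))  # ['a', 'c']
```

In this toy run the second slot goes to the less relevant result "c" rather than "b", because "b" duplicates the interpretation already covered by "a".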
There are many approaches to keyword query diversification over XML data. These approaches derive different search intentions for each keyword in the query. This paper presents a comparative study of these approaches and identifies context-based keyword diversification as the most effective one; it automatically diversifies a given keyword query based on the different contexts of its keywords in the XML data. The method extracts feature terms for each keyword in the query, generates query candidates of high relevance and novelty, and then retrieves search results for those candidates. It retrieves relevant search results efficiently and works effectively on purely text-based datasets. The existing method, however, cannot handle images or text formatting such as italics and bold. The proposed system overcomes this drawback by taking images and text formatting into account during diversification. It can also provide effective query suggestions to users.
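The context-based pipeline described above (contexts, feature terms, query candidates) can be illustrated roughly as follows. This is only a sketch under stated assumptions: a context is taken to be the element path where the keyword occurs, and feature terms are ranked by plain co-occurrence counts as a stand-in for whatever measure a real system uses. The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample document, used only for illustration.
SAMPLE = (
    "<lib>"
    "<book><title>java programming</title><author>john java</author></book>"
    "<book><title>java island travel</title></book>"
    "</lib>"
)


def keyword_contexts(xml_text, keyword):
    """Map each element path containing the keyword to co-occurring terms."""
    root = ET.fromstring(xml_text)
    keyword = keyword.lower()
    contexts = defaultdict(Counter)

    def walk(node, parent_path):
        path = f"{parent_path}/{node.tag}"
        words = (node.text or "").lower().split()
        if keyword in words:
            # Co-occurring words act as candidate feature terms (assumption).
            contexts[path].update(w for w in words if w != keyword)
        for child in node:
            walk(child, path)

    walk(root, "")
    return contexts


def query_candidates(contexts, keyword, per_context=1):
    """Pair the keyword with the top feature terms of each context."""
    candidates = []
    for path, terms in contexts.items():
        for term, _count in terms.most_common(per_context):
            candidates.append((path, f"{keyword} {term}"))
    return candidates


contexts = keyword_contexts(SAMPLE, "java")
for path, query in query_candidates(contexts, "java"):
    print(path, "->", query)
```

Here the keyword "java" is found in two contexts, `/lib/book/title` and `/lib/book/author`, and each context produces its own query candidate that could then be run as a separate search.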


Similar resources

A Survey of Algorithms for Keyword Search on Graph Data

In this chapter, we survey methods that perform keyword search on graph data. Keyword search provides a simple but user-friendly interface to retrieve information from complicated data structures. Since many real life datasets are represented by trees and graphs, keyword search has become an attractive mechanism for data of a variety of types. In this survey, we discuss methods of keyword searc...


Path-based keyword search over XML streams

Recently, a great deal of attention has been focused on processing keyword search over XML and XML streams. Keyword search is simple and provides a user-friendly way of retrieving required data from XML data. Despite its popularity, there is a concern over its efficiency. For this reason, several methods have been proposed to enable keyword search over XML streams. However, most of them ...


From Revisiting the LCA-based Approach to a New Semantics-based Approach for XML Keyword Search

Most keyword search approaches for data-centric XML documents are based on the computation of Lowest Common Ancestors (LCA), such as SLCA and MLCA. In this paper, we show that the LCA is not always a correct search model for processing keyword queries over general XML data. In particular, when an XML database contains relationships among objects, which is quite common in practical data, LCA-bas...


Processing XML Keyword Search by Constructing Effective Structured Queries

Recently, keyword search has attracted a great deal of attention in XML databases. It is hard to directly improve the relevancy of XML keyword search because many keyword-matched nodes may not contribute to the results. To address this challenge, in this paper we design an adaptive XML keyword search approach, called XBridge, that can derive the semantics of a keyword query and generate a set...


Interactive Fuzzy based Search over XML Data for Optimized Performance

In a traditional keyword-search system over XML data, a user composes a keyword query, submits it to the system, and retrieves relevant answers. When the user has limited knowledge about the data, the user often feels “left in the dark” when issuing queries and has to use a try-and-see approach for finding information. In this paper we study TASX (Type-Ahead Search in XML data), a ...



Publication date: 2017